IS : 8900 - 1978
( Reaffirmed 2001 )

Indian Standard
CRITERIA FOR THE REJECTION OF OUTLYING OBSERVATIONS

Quality Control and Industrial Statistics Sectional Committee, EC 3

Chairman                                Representing
DR P. K. BOSE                           University of Calcutta, Calcutta

Members
SHRI B. ANANTHAKRISHNANAND              National Productivity Council, New Delhi
    SHRI R. S. GUPTA ( Alternate )
SHRI M. G. BHADE                        Tata Iron and Steel Co Ltd, Jamshedpur
DIRECTOR                                Institute of Agricultural Research Statistics ( ICAR ), New Delhi
    DR S. S. PILLAI ( Alternate )
SHRI D. DUTTA                           The Indian Tube Co Ltd, Jamshedpur
SHRI Y. GHOUSE KHAN                     NGEF, Bangalore
    SHRI C. RAJANA ( Alternate )
SHRI S. K. GUPTA                        Central Statistical Organization, New Delhi
SHRI A. LAHIRI                          Indian Jute Industries' Research Association, Calcutta
    SHRI U. DUTTA ( Alternate )
SHRI P. LAKSHMANAN                      Indian Statistical Institute, Calcutta
SHRI S. MONDOL                          National Test House, Calcutta
    SHRI S. K. BANERJEE ( Alternate )
DR S. P. MUKHERJEE                      Indian Association for Productivity, Quality and Reliability, Calcutta
    SHRI B. HIMATSINGKA ( Alternate )
SHRI Y. P. RAJPUT                       Army Statistical Organization ( Ministry of Defence ), New Delhi
    SHRI C. L. VERMA ( Alternate )
SHRI RAMESH SHANKER                     Directorate General of Inspection ( Ministry of Defence ), New Delhi
    SHRI N. S. SENGAR ( Alternate )
SHRI T. V. RATNAM                       The South India Textile Research Association, Coimbatore
DR D. RAY                               Defence Research and Development Organization ( Ministry of Defence ), New Delhi
    SHRI S. RANGANATHAN ( Alternate )
REPRESENTATIVE                          Indian Institute of Technology, Kharagpur
SHRI P. R. SENGUPTA                     Tea Board, Calcutta
    SHRI N. RAMADURAI ( Alternate )

INDIAN STANDARDS INSTITUTION

This publication is protected under the Indian Copyright Act ( XIV of 1957 ) and reproduction in whole or in part by any means except with written permission of the publisher shall be deemed to be an infringement of copyright under the said Act.
Members                                 Representing
SHRI S. SUBRAMU                         Steel Authority of India Ltd, New Delhi
SHRI S. N. VOHRA                        Directorate General of Supplies and Disposals, New Delhi
DR B. N. SINGH, Director ( Stat )       Director General, ISI ( Ex-officio Member )

Secretary
SHRI Y. K. BHAT
Deputy Director ( Stat ), ISI

Industrial Statistics Subcommittee, EC 3:7

Convener                                Representing
DR P. K. BOSE                           University of Calcutta, Calcutta; and Indian Institute of Social Welfare and Business Management, Calcutta

Members
DIRECTOR                                Institute of Agricultural Research Statistics ( ICAR ), New Delhi
    DR B. B. P. S. GOEL ( Alternate )
SHRI S. K. GUPTA                        Central Statistical Organization, New Delhi
SHRI S. B. PANDEY                       Imperial Chemical Industries ( India ) Private Ltd, Calcutta
SHRI Y. P. RAJPUT                       Army Statistical Organization ( Ministry of Defence ), New Delhi
    SHRI C. L. VERMA ( Alternate )
DR D. RAY                               Defence Research and Development Organization ( Ministry of Defence ), New Delhi
REPRESENTATIVE                          Indian Institute of Technology, Kharagpur
SHRI B. K. SARKAR                       Indian Statistical Institute, Calcutta
SHRI D. R. SEN                          Delhi Cloth & General Mills Co Ltd, Delhi
SHRI K. N. VALI                         National Sample Survey Organization, New Delhi

0. FOREWORD

0.1 This Indian Standard was adopted by the Indian Standards Institution on 25 July 1978, after the draft finalized by the Quality Control and Industrial Statistics Sectional Committee had been approved by the Executive Committee.

0.2 An outlying observation or an 'outlier' is one that appears to deviate markedly from the other observations of the sample in which it occurs.
An outlier may arise merely because of an extreme manifestation of the random variability inherent in the data, or because of non-random errors such as gross deviation from the prescribed experimental procedure, mistakes in calculations, errors in recording numerical values, other human errors, loss of the calibration of an instrument, change of measuring instruments, etc. If it is known that a mistake has occurred, the outlying observation must be rejected irrespective of its magnitude. If, however, only a suspicion exists, it may be desirable to determine whether such an observation may be rejected or whether it may be accepted as part of the normal variation expected.

0.3 The procedure consists in testing the statistical significance of the outlier(s). A null hypothesis ( assumption ) is made that the suspect observations come from the same population ( or lot ) as the other observations in the sample. A statistical test is then applied to determine whether this null hypothesis can be rejected at the specified level of significance ( see 2.8 ). If so, the outlier(s) can be taken to have come from a population(s) different from that of the other observations in the sample and hence the outlier(s) can be rejected.

0.3.1 This standard provides certain statistical criteria which would be useful for the identification of the outlier(s) on an objective basis. When so identified, necessary investigations may also be initiated wherever possible to find out the assignable causes which gave rise to the outlier(s). The other objectives of the statistical tests for locating the outlier may be to:

a) screen the data before statistical analysis;
b) sound an alarm that outliers are present, thereby indicating the need for a closer study of the data generating process; and
c) pin-point the observations which may be of special interest just because they are extremes.
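The decision logic described in 0.2 and 0.3 may be sketched as follows. This is only an illustrative outline, not part of the standard: the function name `disposition` and the parameter `reject_when_larger` are hypothetical, and the direction in which a statistic is compared with its critical value is assumed to depend on the particular test being applied.

```python
def disposition(known_mistake, statistic=None, critical_value=None,
                reject_when_larger=True):
    """Return 'reject' or 'retain' for a suspect observation.

    known_mistake      -- True if an error (recording, calibration, etc.)
                          is known to have occurred (see 0.2)
    statistic          -- value of the test statistic computed from the sample
    critical_value     -- tabulated value at the chosen level of significance
    reject_when_larger -- assumption: True if large values of the statistic
                          are significant, False if small values are
    """
    # 0.2: a known mistake is rejected irrespective of its magnitude.
    if known_mistake:
        return "reject"

    if statistic is None or critical_value is None:
        raise ValueError("a statistic and a critical value are required "
                         "when only a suspicion exists")

    # 0.3: reject the null hypothesis only if the statistic is significant.
    if reject_when_larger:
        significant = statistic > critical_value
    else:
        significant = statistic < critical_value
    return "reject" if significant else "retain"


# Hypothetical values, for illustration only.
print(disposition(True))                 # known mistake
print(disposition(False, 4.22, 3.68))    # statistic beyond the critical value
print(disposition(False, 2.53, 2.75))    # statistic within the critical value
```

An observation classed as 'reject' by such a rule would still call for the investigation of assignable causes mentioned in 0.3.1.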
0.4 When a test method recommends more than one determination for reporting the average value of the characteristic under consideration, the precision ( repeatability and reproducibility ) of the test method is found to be useful in detecting observations which deviate unduly from the rest. For further guidance in this regard, reference may be made to IS : 5420 ( Part I )-1969*. When the precision of the test method is not known quantitatively, one or more of the procedures covered in the standard may be found helpful. The tests outlined in this standard primarily apply to the observations in a single random sample or the experimental data as given by the replicate measures of some property of a given material.

*Guide on precision of test methods: Part I Principles and applications.

0.5 Although a number of statistical tests based on various considerations are available to screen the given data for outliers, only those tests which are simple and efficient have been included in this standard.

0.5.1 In the case of a single outlier ( the smallest or the largest suspect observation ), two tests are available. One test is based on the standard deviation whereas the other test is based on the ratio of differences between certain order statistics, that is, the observations when they are arranged in the ascending or descending order of magnitude. The latter test would be more useful in those cases where the calculation of standard deviation is to be avoided or a quick judgement is called for.

0.5.2 In the case of two or more outliers at either end of the ordered sample observations, the test given is based on the ratio of the sample sum of squares when the doubtful observations are omitted to the sample sum of squares when the doubtful observations are included, the sample sum of squares being defined as the sum of the squares of the deviations of the observations from the corresponding mean. If simplicity in calculation is the prime requirement, the test based on the order statistics for the case of single outliers may be used by actually omitting one suspect observation in the sample at a time. However, this test is to be applied with caution because the overall level of significance of the test may change due to its repeated applications.

0.5.3 In the case of two or more outliers such that at least one is at each end of the ordered sample observations, two tests have been given. One test is based on the ratio of the range to the standard deviation whereas the other test is based on the largest residuals, a residual being defined as the deviation of an observation from the corresponding mean. It may be mentioned that the former test is applicable when only two observations, namely, the largest and the smallest observations, are suspect whereas the latter test is much more general.

0.6 In a given set of data, whenever a large number of observations ( say more than 25 percent ) are found to be outlying, it may be desirable to discard the entire data. However, this guideline is to be applied with considerable caution in those situations where the data consists of a few observations only.

0.7 Almost all the statistical criteria for the identification of the outliers as given in this standard are based on the assumption that the underlying distribution of the observations is normal. Hence, it is important that these criteria are not used indiscriminately. In case the assumption of normality is in doubt, it is advisable to obtain the guidance of a competent statistician to ascertain the feasibility of the applicability of these test criteria.

0.8 In reporting the result of a test or analysis, if the final value, observed or calculated, is to be rounded off, it shall be done in accordance with IS : 2-1960*.

*Rules for rounding off numerical values ( revised ).

1. SCOPE

1.1 This standard lays down the criteria for the detection of outlying observations in the following three situations:

a) Single outlier ( at either end of the ordered sample observations ),
b) Two or more outliers ( at either end of the ordered sample observations ), and
c) Two or more outliers ( one at least at each of the two ends of the ordered sample observations ).

1.2 In situations other than those enumerated in 1.1, it may be more appropriate to conduct tests for non-normality of observations, which are to be covered in a separate standard.

2. TERMINOLOGY

2.0 For the purpose of this standard, the following definitions shall apply.

2.1 Sample - Collection of items from a lot.

2.2 Sample Size ( n ) - Number of items in a sample.

2.3 Mean ( Arithmetic ) ( x̄ ) - Sum of the values of the observations divided by the number of observations.

2.4 Standard Deviation ( s ) - The square root of the quotient obtained by dividing the sum of the squares of deviations of the observations from their mean by one less than the number of observations in the sample.

2.5 Variance - Square of the standard deviation.

2.6 Range - The difference between the largest and the smallest observations in a sample.

2.7 Null Hypothesis - Hypothesis ( or assumption ) that all the observations in the sample come from the same parent population, distribution or lot.

2.8 Level of Significance - The probability ( or risk ) of rejecting the null hypothesis when it is true. Conventionally it is taken to be 5 or 1 percent.

2.9 Statistic - A function calculated from the observations in the sample.

2.10 Critical Value - The value of the appropriate statistic which would be exceeded by chance with some small probability equivalent to the level of significance chosen.

2.11 Degrees of Freedom - The number of independent component values which are necessary to determine a statistic.

3. TESTS FOR SINGLE OUTLIER

3.0 There are many situations when, in a given sample of size n, one of the observations, which is either the largest or the smallest, is suspect. To detect such outliers, two tests have been given. The first one needs the calculation of the average and the standard deviation of the sample observations whereas the second one is based on order statistics and hence is easier to operate.

3.1 Let x1, x2, ..., xn be the n sample observations arranged in the ascending order of magnitude so that x1 < x2 < ... < xn.

TABLE 4  CRITICAL VALUES OF R/s FOR THE DETECTION OF TWO OUTLIERS
( ONE AT EACH END OF ORDERED RESULTS )
( Clauses 5.1 and 5.1.1 )

SAMPLE SIZE, n      SIGNIFICANCE LEVEL
                    5 Percent     1 Percent
      3               2.00          2.00
      4               2.43          2.45
      5               2.75          2.80
      6               3.01          3.10
      7               3.22          3.34
      8               3.40          3.54
      9               3.55          3.72
     10               3.68          3.88
     11               3.80          4.01
     12               3.91          4.13
     13               4.00          4.24
     14               4.09          4.34
     15               4.17          4.43
     16               4.24          4.51
     17               4.31          4.59
     18               4.38          4.66
     19               4.43          4.73
     20               4.49          4.79
     30               4.89          5.25
     40               5.15          5.54
     50               5.35          5.77

TABLE 5  CRITICAL VALUES OF Ek FOR THE DETECTION OF k ( SOME SMALL AND OTHERS LARGE ) OUTLIERS
( Clauses 5.2 and 5.2.1 )
[ Body of Table 5: critical values of Ek at the 5 percent and 1 percent levels of significance for k = 1 to 10 and sample sizes up to n = 50. The individual entries are not legible in this copy. ]
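As an illustration of the range-based test mentioned in 0.5.3, the ratio R/s may be computed from the definitions of mean, standard deviation and range in 2.3, 2.4 and 2.6 and compared with the 5 percent critical values of Table 4. The following is a minimal sketch only. The function names and the sample data are hypothetical, and the rule that the null hypothesis is rejected when R/s exceeds the tabulated value is an assumption drawn from the definition of critical value in 2.10.

```python
import math

# 5 percent critical values of R/s from Table 4, keyed by sample size n.
R_OVER_S_CRITICAL_5_PERCENT = {
    3: 2.00, 4: 2.43, 5: 2.75, 6: 3.01, 7: 3.22, 8: 3.40, 9: 3.55,
    10: 3.68, 11: 3.80, 12: 3.91, 13: 4.00, 14: 4.09, 15: 4.17,
    16: 4.24, 17: 4.31, 18: 4.38, 19: 4.43, 20: 4.49,
    30: 4.89, 40: 5.15, 50: 5.35,
}


def range_to_sd_ratio(observations):
    """Return R/s: the range (2.6) divided by the standard deviation (2.4)."""
    n = len(observations)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(observations) / n
    sum_of_squares = sum((x - mean) ** 2 for x in observations)
    s = math.sqrt(sum_of_squares / (n - 1))  # divisor is n - 1, as in 2.4
    if s == 0:
        raise ValueError("all observations are equal; R/s is undefined")
    return (max(observations) - min(observations)) / s


def both_extremes_suspect(observations, table=R_OVER_S_CRITICAL_5_PERCENT):
    """Return True if R/s exceeds the tabulated value for this sample size.

    Assumption: rejection occurs when the statistic exceeds the critical
    value. Only sample sizes listed in the table are handled; no
    interpolation is attempted.
    """
    n = len(observations)
    if n not in table:
        raise KeyError(f"no tabulated critical value for n = {n}")
    return range_to_sd_ratio(observations) > table[n]


# Hypothetical data: ten results with one low and one high suspect value.
sample = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 12.5, 7.5]
print(round(range_to_sd_ratio(sample), 2))   # 4.22, against 3.68 for n = 10
print(both_extremes_suspect(sample))         # True
```

For sample sizes between the tabulated rows ( for example n = 25 ), this sketch raises an error rather than interpolating, since the treatment of intermediate sample sizes is not stated in the text above.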